{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "q4WF3l23pumU"
      },
      "source": [
        "##### Copyright 2020 The AdaNet Authors."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "both",
        "colab": {
          "test": {
            "output": "ignore",
            "timeout": 900
          }
        },
        "colab_type": "code",
        "id": "Kic2quJWppmx"
      },
      "outputs": [],
      "source": [
        "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "rnYmxsCKTS2W"
      },
      "source": [
        "# AdaNet ModelFlow Tutorial\n",
        "\n",
        "AdaNet ModelFlow is a framework to define end-to-end AutoML workflows. In this tutorial, we will outline a sample workflow for how to construct and run a ModelFlow pipeline.\n",
        "\n",
        "Our pipeline will iteratively construct an ensemble of CNN architectures to classify Fashion MNIST examples."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "cFeRpu0uQTSM"
      },
      "source": [
        "## Imports\n",
        "The necessary imports are `tensorflow` and `adanet`. We will be using `kerastuner` to generate our subnetworks."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Y-QzhAbiiCLj"
      },
      "source": [
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "c1O5y-Lukwgj"
      },
      "outputs": [],
      "source": [
        "!pip install adanet, kerastuner"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "CYBslC3HUn_f"
      },
      "outputs": [],
      "source": [
        "# Load TensorBoard magic.\n",
        "%load_ext tensorboard\n",
        "\n",
        "import time\n",
        "import kerastuner\n",
        "\n",
        "import adanet.experimental as adanet\n",
        "import numpy as np\n",
        "import tensorflow.compat.v1 as tf\n",
        "\n",
        "RANDOM_SEED = 42"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "bMLIyRDPQKm6"
      },
      "source": [
        "ModelFlow is built on TensorFlow 2.0, so make sure to enable 2.0 behavior."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "7HqIBTuL2jdo"
      },
      "outputs": [],
      "source": [
        "tf.enable_v2_behavior()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "wHGCIg28j00G"
      },
      "source": [
        "## Loading Fashion MNIST Dataset\n",
        "\n",
        "Conveniently, the dataset can be loaded easily from within TensorFlow."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Wqlw6SiScMPt"
      },
      "outputs": [],
      "source": [
        "fashion_mnist = tf.keras.datasets.fashion_mnist\n",
        "(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "qzFdJXu4TCSe"
      },
      "source": [
        "Perform some basic preprocessing on our data and convert them to `tf.data.Dataset` objects."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "em5TEGnUjbG0"
      },
      "outputs": [],
      "source": [
        "# Normalize images.\n",
        "train_images = train_images / 255.0\n",
        "test_images = test_images / 255.0\n",
        "\n",
        "# Adding a dimension to conform to what a Keras model expects.\n",
        "train_images = np.expand_dims(train_images, -1)\n",
        "test_images = np.expand_dims(test_images, -1)\n",
        "\n",
        "# Construct datasets.\n",
        "train_dataset = tf.data.Dataset.from_tensor_slices((train_images, train_labels))\n",
        "test_dataset = tf.data.Dataset.from_tensor_slices((test_images, test_labels))\n",
        "\n",
        "# Note: we are only repeating the dataset once here for demo purposes.\n",
        "# Increasing the repetition size would improve model performance, but it would\n",
        "# also slow training.\n",
        "train_dataset = train_dataset.shuffle(1000, seed=RANDOM_SEED).batch(64).repeat(1)\n",
        "test_dataset = test_dataset.batch(64)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ONPTCow0ng8M"
      },
      "source": [
        "## Setup Keras Tuner Model\n",
        "\n",
        "We will use [Keras Tuner](https://github.com/keras-team/keras-tuner) to define, train and tune our candidate networks."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "g_DewxJljuRe"
      },
      "outputs": [],
      "source": [
        "def build_model(hp):\n",
        "\n",
        "  # Define hyperparameter search space.\n",
        "  filters = hp.Choice('filters', values=[16, 32])\n",
        "  kernel_size = hp.Choice('kernel_size', values=[2, 3])\n",
        "  pool_size = hp.Choice('pool_size', values=[2, 4])\n",
        "  learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])\n",
        "  dropout1 = hp.Float('dropout1', min_value=0.5, max_value=0.9, step=0.1)\n",
        "  dense1_units = hp.Choice('dense1_units', values=[64, 128])\n",
        "  dropout2 = hp.Float('dropout2', min_value=0.5, max_value=0.9, step=0.1)\n",
        "\n",
        "  # Define model architecture.\n",
        "  # Credit: https://github.com/abelusha/MNIST-Fashion-CNN/blob/master/Fashon_MNIST_CNN_using_Keras_10_Runs.ipynb\n",
        "  model = tf.keras.Sequential()\n",
        "  model.add(tf.keras.layers.Conv2D(filters, kernel_size, activation='relu'))\n",
        "  model.add(tf.keras.layers.MaxPool2D())\n",
        "  model.add(tf.keras.layers.Conv2D(filters, kernel_size, activation='relu'))\n",
        "  model.add(tf.keras.layers.MaxPool2D())\n",
        "  model.add(tf.keras.layers.Dropout(dropout1, seed=RANDOM_SEED))\n",
        "  model.add(tf.keras.layers.Flatten())\n",
        "  model.add(tf.keras.layers.Dense(dense1_units, activation='relu'))\n",
        "  model.add(tf.keras.layers.Dropout(dropout2, seed=RANDOM_SEED))\n",
        "  model.add(tf.keras.layers.Dense(10, activation='softmax'))\n",
        "\n",
        "  model.compile(\n",
        "    optimizer=tf.keras.optimizers.Adam(\n",
        "      learning_rate=learning_rate),\n",
        "      loss='sparse_categorical_crossentropy',\n",
        "      metrics=['accuracy'])\n",
        "  \n",
        "  return model"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "CyiDMqXenYtF"
      },
      "source": [
        "## Define our AutoML Pipeline\n",
        "\n",
        "We define an AutoML pipeline simulating the AdaNet algorithm."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "ix1LADVPmqDr"
      },
      "outputs": [],
      "source": [
        "# Create a shared storage between AutoEnsemblePhases.\n",
        "autoensemble_storage = adanet.storages.InMemoryStorage()\n",
        "\n",
        "# Set pipeline parameters.\n",
        "# Note: We use few repetitions and max_trials for demonstration\n",
        "# purposes. Increasing these values should improve model\n",
        "# performance, but it would also slow training.\n",
        "#@title Pipeline Parameters\n",
        "max_trials = 10#@param {'type': 'integer'}\n",
        "repetitions = 2#@param {'type': 'integer'}\n",
        "\n",
        "# Define pipeline phases.\n",
        "input_phase = adanet.phases.InputPhase(train_dataset, test_dataset)\n",
        "repeat_phase = adanet.phases.RepeatPhase(\n",
        "  phase_factory=[\n",
        "    lambda: adanet.phases.KerasTunerPhase(\n",
        "      kerastuner.tuners.RandomSearch(\n",
        "        build_model,\n",
        "        objective='val_accuracy',\n",
        "        max_trials=max_trials,\n",
        "        executions_per_trial=1,\n",
        "        directory='/tmp',\n",
        "        # Make sure each KerasTunerPhase has a unique project_name.\n",
        "        project_name=\"tutorial\"+str(int(time.time())),\n",
        "        overwrite=True,\n",
        "        seed=RANDOM_SEED,\n",
        "      )\n",
        "    ),\n",
        "    lambda: adanet.phases.AutoEnsemblePhase(\n",
        "      ensemblers=[\n",
        "        adanet.phases.autoensemble_phase.MeanEnsembler(\n",
        "            'sparse_categorical_crossentropy', \n",
        "            'adam', \n",
        "            ['accuracy'])\n",
        "      ],\n",
        "      ensemble_strategies=[\n",
        "        adanet.phases.autoensemble_phase.GrowStrategy(), \n",
        "        adanet.phases.autoensemble_phase.AllStrategy(), \n",
        "      ],\n",
        "      storage=autoensemble_storage,\n",
        "      num_candidates=4\n",
        "    )\n",
        "  ],\n",
        "  repetitions=repetitions\n",
        ")\n",
        "\n",
        "controller = adanet.controllers.SequentialController([input_phase, \n",
        "                                                      repeat_phase])\n",
        "model_search = adanet.keras.ModelSearch(controller)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "bB5h2qAanv9x"
      },
      "source": [
        "## Run our AutoML Pipeline\n",
        "\n",
        "Single command to launch your pipeline."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "vZxNInWHnmSb"
      },
      "outputs": [],
      "source": [
        "model_search.run()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "SnvSpJiSdY2x"
      },
      "source": [
        "## Visualize the run in TensorBoard\n",
        "\n",
        "Setting the y-axis to wall time best illustrates the training process."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "0gH5uiLFdXOI"
      },
      "outputs": [],
      "source": [
        "%tensorboard --logdir=/tmp/ --port=0"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "sDQODD4TUZ14"
      },
      "source": [
        "## Visualize top models from run\n",
        "\n",
        "We first obtain the top model and then we print its summary. We also print the subnetwork summaries to examine the structure of the top performing ensemble.\n",
        "\n",
        "Since the resulting ensembles are just Keras models, we can use them in any context that Keras models can be used!\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "hIfe0S4KUiB4"
      },
      "outputs": [],
      "source": [
        "best_models = list(model_search.get_best_models(1))\n",
        "best_models[0].summary()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "HsfEShK5eDnQ"
      },
      "outputs": [],
      "source": [
        "for submodel in best_models[0].submodels:\n",
        "  print(submodel.summary())"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
            "name": "adanet_modelflow_tutorial.ipynb",
      "provenance": [
        {
          "file_id": "1tIdKICO3TyFknNzVr6SEF2UdbfMFq6xG",
          "timestamp": 1579278333179
        }
      ]
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
